Conversation

@infil00p (Contributor) commented Dec 6, 2025

No description provided.

infil00p and others added 5 commits January 2, 2026 18:42
Integrate ONNX Runtime as an alternative inference backend alongside llama.cpp,
enabling GPU-accelerated inference for vision-language models with platform-specific
execution providers (DirectML, CUDA, CoreML).

Changes:
- Add ONNX Runtime dependencies with platform-specific features (DirectML/CUDA/CoreML)
- Create vlm_onnx.rs module for ONNX inference engine supporting SmolVLM models
- Extend ModelManager with ONNX model download functionality from HuggingFace
- Add Tauri commands for ONNX model operations (download, load, generate)
- Update UI with ONNX Models tab in model selection modal
- Add quantization selector (Q4, Q8, FP16) for ONNX models
- Configure downloads for SmolVLM2-256M-Video-Instruct with correct HF repo structure
  (ONNX files in onnx/ subdirectory, config/tokenizer at root)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
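
A minimal sketch of the session setup this commit describes, using the `ort` crate (the common Rust binding for ONNX Runtime). Builder names follow the ort 2.x release-candidate API, and module paths vary by version; in the actual `vlm_onnx.rs` each provider would additionally sit behind its platform-specific feature flag, so treat this as an assumption-laden illustration rather than the repository's code:

```rust
use ort::execution_providers::{
    CUDAExecutionProvider, CoreMLExecutionProvider, DirectMLExecutionProvider,
};
use ort::session::Session;

/// Build a session that prefers a platform GPU execution provider.
/// ort registers providers in order and falls back to the CPU provider
/// when one is unavailable at runtime.
fn load_session(model_path: &str) -> ort::Result<Session> {
    Session::builder()?
        .with_execution_providers([
            DirectMLExecutionProvider::default().build(), // Windows
            CUDAExecutionProvider::default().build(),     // Linux + NVIDIA
            CoreMLExecutionProvider::default().build(),   // macOS
        ])?
        .commit_from_file(model_path)
}
```
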
Implement logic to detect model backend type and route to appropriate inference engine,
enabling seamless switching between llama.cpp and ONNX Runtime backends.

Changes:
- Update model loading useEffect to check backend type
- Route to load_onnx_model for ONNX models, load_model for llama.cpp
- Disable audio capability check for ONNX models (not yet supported)
- Add backend detection in handleSendMessage for inference routing
- Convert image data appropriately for each backend:
  - RGB array for llama.cpp (existing)
  - JPEG bytes for ONNX Runtime (new)
- Call generate_onnx_response for ONNX, generate_response for llama.cpp

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
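
On the Rust side of that conversion, the JPEG bytes for the ONNX path can be produced with the `image` crate already in the dependency tree. A hedged sketch against the image 0.24 API; the function name and quality value are illustrative, not taken from the repository:

```rust
use image::codecs::jpeg::JpegEncoder;
use image::ColorType;

/// Re-encode a raw RGB frame as JPEG bytes for the ONNX backend.
/// The llama.cpp path keeps receiving the untouched RGB array.
fn rgb_to_jpeg(rgb: &[u8], width: u32, height: u32) -> image::ImageResult<Vec<u8>> {
    let mut jpeg = Vec::new();
    // Quality 85 is an assumed setting, not a value from the commit.
    let encoder = JpegEncoder::new_with_quality(&mut jpeg, 85);
    encoder.encode(rgb, width, height, ColorType::Rgb8)?;
    Ok(jpeg)
}
```
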
Update list_downloaded_models to include ONNX models by checking for
_onnx_ pattern in directory names and verifying .onnx files exist.
Add frontend model ID normalization so that ONNX models from HuggingFace
repos match their normalized directory names (slashes, dashes, and case).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
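
A sketch of the two rules this commit describes, assuming ONNX downloads land in directories whose names carry an `_onnx_` marker, with `.onnx` files either at the top level or in the `onnx/` subdirectory mentioned earlier; the exact normalization scheme is a guess:

```rust
use std::fs;
use std::path::Path;

/// A directory counts as a downloaded ONNX model when its name contains
/// the "_onnx_" marker and it (or its onnx/ subdirectory) actually holds
/// at least one .onnx file.
fn is_onnx_model_dir(dir: &Path) -> bool {
    let name = dir.file_name().and_then(|n| n.to_str()).unwrap_or("");
    if !name.contains("_onnx_") {
        return false;
    }
    let has_onnx_file = |d: &Path| {
        fs::read_dir(d)
            .map(|entries| {
                entries
                    .flatten()
                    .any(|e| e.path().extension().map_or(false, |ext| ext == "onnx"))
            })
            .unwrap_or(false)
    };
    has_onnx_file(dir) || has_onnx_file(&dir.join("onnx"))
}

/// Assumed frontend normalization: lower-case the HuggingFace repo id and
/// map slashes and dashes to underscores so it matches directory names.
fn normalize_model_id(repo_id: &str) -> String {
    repo_id.to_lowercase().replace('/', "_").replace('-', "_")
}
```
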
- Add SmolVLMImageProcessor with 4x4 grid splitting + global image
  (17 frames total, matching SmolVLM's expected input format)
- Add proper prompt expansion with grid tokens
  (<fake_token_around_image>, <row_X_col_Y>, <global-img>)
- Auto-detect layer count from decoder model inputs
- Support multiple EOS tokens (ID 2, <end_of_utterance>, </s>)
- Fix pixel coordinate order to match model expectations
- Add config.json path to model loading for HuggingFace config
- Add test example for VLM inference

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
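
A condensed sketch of the 4x4-grid-plus-global split with the `image` crate; the tile size and resampling filter are assumptions, not values from the actual `SmolVLMImageProcessor`:

```rust
use image::{imageops::FilterType, DynamicImage};

/// Split an image into a 4x4 grid of tiles plus a downscaled copy of the
/// whole image, yielding the 17 frames SmolVLM expects.
fn split_into_frames(img: &DynamicImage, tile: u32) -> Vec<DynamicImage> {
    // Resize so the image divides evenly into 4x4 cells of `tile` pixels.
    let grid = img.resize_exact(tile * 4, tile * 4, FilterType::Triangle);
    let mut frames = Vec::with_capacity(17);
    for row in 0..4 {
        for col in 0..4 {
            frames.push(grid.crop_imm(col * tile, row * tile, tile, tile));
        }
    }
    // Frame 17 is the global image, announced by <global-img> in the prompt.
    frames.push(img.resize_exact(tile, tile, FilterType::Triangle));
    frames
}
```

In the expanded prompt, each tile is then introduced by its `<row_X_col_Y>` token between `<fake_token_around_image>` markers, with `<global-img>` preceding the final frame.
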
Remove the duplicate image = "0.25" entry from dev-dependencies, which
conflicted with image = "0.24" in the main dependencies and caused an
E0464 compilation error.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
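
The shape of the fix in Cargo.toml, with surrounding entries omitted:

```toml
[dependencies]
image = "0.24"

[dev-dependencies]
# image = "0.25"  # removed: the second copy conflicted with the 0.24
#                 # entry above and triggered E0464; tests now build
#                 # against the same 0.24 version
```
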
@infil00p marked this pull request as ready for review January 4, 2026
@infil00p merged commit 00dff51 into main Jan 4, 2026
2 checks passed